Pay-as-you-go Feedback in Data Quality Systems

Authors

  • Romila Pradhan
  • Siarhei Bykau
  • Sunil Prabhakar
Abstract

In many domains such as the web, sensor networks and social media, sources often provide conflicting information. It is of utmost importance to resolve conflicts and identify correct information. A number of approaches, referred to as truth finders, have been proposed recently. They address the problem of truth discovery using different principles such as link analysis, Bayesian modeling and reputation systems. None of the existing approaches, however, leverages user feedback to improve the performance of these truth finders. In the present work, we propose a novel framework based on the concept of the value of perfect information that orders existing conflicts by their ability to boost the collective performance of the truth finder on all objects. We devise a number of algorithms that take into account the voting network structure and the level of agreement/disagreement among sources, and produce effective orderings of objects for validation with interactive response rates. Finally, we present an extensive experimental evaluation where we show that our solution outperforms existing truth finders, and also study the trade-offs between the efficiency and effectiveness of the various ordering algorithms.
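As a rough illustration of ordering conflicting objects for user validation, the sketch below ranks objects by how strongly their sources disagree. It is not the paper's value-of-perfect-information framework: it substitutes a simple Shannon-entropy heuristic over source votes, and the data, source names, and function names (`claims`, `vote_entropy`, `order_for_validation`) are hypothetical.

```python
from collections import Counter
from math import log2

# Hypothetical toy data: for each object, the value reported by each source.
claims = {
    "capital_of_australia": {"s1": "Canberra", "s2": "Sydney", "s3": "Canberra", "s4": "Sydney"},
    "boiling_point_c": {"s1": "100", "s2": "100", "s3": "100", "s4": "90"},
    "author_of_hamlet": {"s1": "Shakespeare", "s2": "Shakespeare", "s3": "Shakespeare"},
}


def vote_entropy(votes):
    """Shannon entropy (in bits) of the distribution of values claimed for one object."""
    counts = Counter(votes.values())
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def order_for_validation(claims):
    """Rank objects so that the most contested ones are validated first.

    Assumption: disagreement (entropy) stands in for the benefit of asking
    the user; the paper's algorithms also use the voting network structure.
    """
    return sorted(claims, key=lambda obj: vote_entropy(claims[obj]), reverse=True)


ranking = order_for_validation(claims)
print(ranking)
```

An evenly split object ranks first and a unanimous one last. A heuristic like this ignores source reliability, which is one reason the abstract's algorithms also consider the voting network.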


Similar resources

Pay-as-you-go Data Integration: Experiences and Recurring Themes

Data integration typically seeks to provide the illusion that data from multiple distributed sources comes from a single, well managed source. Providing this illusion in practice tends to involve the design of a global schema that captures the users' data requirements, followed by manual (with tool support) construction of mappings between sources and the global schema. This overall approach can...


Financing Long-term Care: Some Ideas From Switzerland; Comment on “Financing Long-term Care: Lessons From Japan”

Ikegami reviews the implementation of mandatory long-term care insurance systems in Germany and Japan, which are organized as pay-as-you-go systems. I propose to go one step further and implement a multi-pillar, mandatory and voluntary long-term care financing system, which combines pay-as-you-go with capital-funded elements. The proposal is based on the observation tha...


A pay-as-you-go framework for query execution feedback

Past work has suggested that query execution feedback can be useful in improving the quality of plans by correcting cardinality estimation errors in the query optimizer. The state-of-the-art approach for obtaining execution feedback is “passive” monitoring which records the cardinality of each operator in the execution plan. We observe that there are many cases where even after repeated executi...


Pay-as-you-go Configuration of Entity Resolution

Entity resolution, which seeks to identify records that represent the same entity, is an important step in many data integration and data cleaning applications. However, entity resolution is challenging both in terms of scalability (all-against-all comparisons are computationally impractical) and result quality (syntactic evidence on record equivalence is often equivocal). As a result, end-to-e...


Functional Dependency Generation and Applications in Pay-As-You-Go Data Integration Systems

Recently, the opportunity of extracting structured data from the Web has been identified by a number of research projects. One such example is that millions of relational-style HTML tables can be extracted from the Web. Traditional data integration approaches do not scale over such corpora with hundreds of small tables in one domain. To solve this problem, previous work has proposed pay-as-you-...


Efficient Feedback Collection for Pay-as-you-go Source Selection

Technical developments, such as the web of data and web data extraction, combined with policy developments such as those relating to open government or open science, are leading to the availability of increasing numbers of data sources. Indeed, given these physical sources, it is then also possible to create further virtual sources that integrate, aggregate or summarise the data from the origin...




Publication date: 2015